(57) Abstract

A nucleic acid fragment comprising a portion of at
least 17 contiguous nucleotide bases which portion has a
sequence the same as, or homologous to a portion of
corresponding length of the sequence of the coding strand as set
out in Fig. 1 or the same as, or homologous to a portion of
corresponding length of the sequence complementary to the
sequence of the coding strand set out in Fig. 1.

MUCIN NUCLEOTIDES

The present invention relates to nucleotide
fragments, polypeptides and antibodies and their use in
medical treatment and diagnosis.

In International Patent Application no. WO-A-88/05054
there is disclosed a tandem repeat sequence contained in
the human polymorphic epithelial mucin (HPEM) gene and
nucleotide probes, polypeptides, antibodies and
antibody-producing cells which are useful in the diagnosis
and treatment of adenocarcinomas such as breast cancer.

The present inventors have now elucidated the
nucleotide base sequence of the gene in the region 5' of
the tandem repeat sequence (unless the context implies
otherwise, directions such as "5'" or "3'", "upstream" or
"downstream" used herein refer to the non-template strand
of the genomic DNA or fragments thereof). The complete
sequence of the 1763 nucleotide bases of the non-template
strand upstream of and including the first SmaI
restriction site in the tandem repeat is set out in
Fig. 1. The sequence of 1575 nucleotide bases of the
non-template strand upstream of and including the first SmaI
restriction site in the tandem repeat as set out in Fig. 3
has been extended and some parts have been corrected in
the light of repeat experiments. The template strand has
a complementary sequence and it is this strand which is
transcribed into RNA during expression of the gene
product.

In addition to conventional transcriptional and
translational start sites and intron splicing sites, this
sequence contains a number of features which may be
important in the diagnosis and therapy of cancers and in
expression of proteins from recombinant vectors. These
features will be described below. The amino acid sequence
corresponding to the translated portions of this
nucleotide sequence gives rise to peptides and thence to
antibodies and antibody-producing cells which may also be
useful in such diagnosis and treatment.

In one aspect the present invention provides a
nucleic acid fragment comprising a portion of at least 17
contiguous nucleotide bases which portion has a sequence
the same as, or homologous to a portion of corresponding
length of the sequence of the coding strand as set out in
Fig. 1 or the same as, or homologous to a portion of
corresponding length of the sequence complementary to the
sequence of the coding strand set out in Fig. 1.

As used herein the term "fragment" is intended to
include restriction endonuclease-generated nucleic acid
molecules and synthetic oligonucleotides.

The nucleic acid fragments of the invention may be
single-stranded or double-stranded and they may be RNA or
DNA fragments. Single stranded fragments may be "plus" or
coding strands having the sequence of Fig. 1 or a part
thereof or a sequence homologous thereto. Alternatively
the single stranded fragments may be "minus" or non-coding
strands having a sequence complementary to the sequence of
Fig. 1 or a part thereof or a sequence homologous thereto.
Double stranded fragments contain a complementary pair of
strands (i.e. one plus strand and one minus strand).

RNA fragments according to the invention will, of
course, contain uridylic acid ("U") residues in place of
the deoxythymidylic acid ("T") residues of the coding
(non-template) strand set out in Fig. 1 or, if
complementary to the sequence of the coding strand, they
will contain U residues in positions complementary to the
adenylic acid ("A") residues in the coding strand set out
in Fig. 1.
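
The relationship between plus strands, minus strands and their RNA counterparts can be sketched in a few lines of Python. This is only an illustration: the 12-base sequence is hypothetical and is not taken from Fig. 1, and the function names are chosen here for convenience.

```python
# Illustrative sketch only: the 12-base sequence below is made up,
# it is NOT taken from Fig. 1.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the complementary ("minus") strand, written 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def to_rna(dna: str) -> str:
    """Write a DNA strand as RNA by replacing every T with U."""
    return dna.upper().replace("T", "U")


coding = "ATGACACCGGGC"              # hypothetical coding ("plus") strand
minus = reverse_complement(coding)   # complementary ("minus") strand

print(to_rna(coding))  # RNA with the coding-strand sequence: U replaces T
print(to_rna(minus))   # RNA complementary to it: U pairs with each A
```

In the second RNA every U sits opposite an A of the coding strand, which is the situation described for fragments complementary to the coding strand.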

Preferably the nucleic acid fragments of the
invention are double-stranded DNA fragments.
Single-stranded nucleic acid fragments of the invention
are at least 17 nucleotide bases in length.
Double-stranded nucleic acid fragments of the invention
are at least 17 nucleotide base pairs in length.
Preferably the fragments are at least 20 bases or base
pairs in length, more preferably at least 25 bases or base
pairs and yet more preferably at least 50 bases or base
pairs in length.

Statistically it is almost certain that a 17
nucleotide base sequence will be unique so that any
nucleic acid fragment having a contiguous portion of 17
nucleotides of a sequence identical to a portion of
corresponding length of the sequence as set out in
Fig. 1, or the same as the non-coding strand complementary
to the sequence of Fig. 1, will be new. Fragments
according to the invention which are only 17 nucleotides
or nucleotide bases in length have a sequence the same as,
or complementary to, that set out in Fig. 1. Longer
fragments of the invention may have a sequence which is
homologous to a corresponding portion of the sequence for
the coding strand as set out in Fig. 1 or to the
complementary non-coding strand.
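
The statistical argument for the 17-base threshold can be illustrated with a rough calculation. The sketch below rests on two assumptions that are not stated in the text: a haploid human genome of about 3 x 10^9 base pairs, and bases occurring independently with equal frequency. Real genomes contain repeats, so the figures are only indicative.

```python
# Back-of-envelope sketch. Assumptions (not from the text): a haploid genome
# of about 3e9 base pairs, and independent, equally frequent bases.
GENOME_BP = 3_000_000_000


def expected_chance_matches(k: int, genome_bp: int = GENOME_BP) -> float:
    """Expected number of exact chance occurrences of one fixed k-mer,
    counting start positions on both strands of the genome."""
    positions = 2 * (genome_bp - k + 1)
    return positions / 4 ** k


for k in (12, 15, 17, 20):
    # k, number of distinct k-mers, expected chance occurrences
    print(k, 4 ** k, f"{expected_chance_matches(k):.3g}")
```

Under these assumptions the number of distinct 17-mers (4^17) exceeds the number of positions at which one could occur, whereas a 15-mer would be expected to recur several times by chance.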

Preferably nucleic acid fragments according to the
invention have [...] sequence homology with a
corresponding portion of the sequence of Fig. 1 or the
complementary non-coding strand, for instance 80 or 85%,
more preferably 90 or even 95% homology. Differences may
arise through deletions, insertions or substitutions.
In addition to containing a portion the
same as the sequence of the coding strand in Fig. 1 or
complementary non-coding strand, the nucleic acid
fragments of the invention may include sequences
completely unrelated to that in Fig. 1.
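
As a simple illustration of how a homology percentage of this kind might be computed, the sketch below counts identical positions between two sequences of equal length. It is a deliberately minimal measure that handles substitutions only; insertions and deletions would need a proper alignment. Both sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length sequences agree.

    Substitutions only: gaps (insertions/deletions) are not handled here.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


reference = "ATGACACCGGGCACCCAGTC"  # hypothetical 20-base reference portion
variant = "ATGACTCCGGGCACCGAGTC"    # same portion with two substitutions

print(percent_identity(reference, variant))
```

Two substitutions in 20 bases leave 18 matching positions, so this pair would fall within the "90% homology" band mentioned above.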

Particular features of interest within the coding
strand in Fig. 1 are set out in Tables 1 to 3 below:

TABLE 1: Signal Sequences

Location*     Sequence in PEM                   Significance

1-2           CG                                transcriptional start site
73-75         ATG                               translational start signal
131-132       GT                                start of first intron
631-632       AG                                end of first intron
100-130 and   TTCCTGCTGCTGCTCCTGACAGTGCTTACAG   Signal sequence, interrupted
633-637       ...TTGTT                          by first intron (first intron
                                                indicated by "...")
955-960       CCCGGG                            SmaI site at start of tandem
                                                repeat

Footnotes to Tables 1 and 2

+ In the consensus sequences: R is A or G
                              N is A, C, G or T
                              W is A or T
                              X is [...]
                              Y is C or T

* Locations are of the 5' base of the indicated PEM
sequence numbered as in Fig. 1.
TABLE 2: Regulatory elements within the 5' flanking sequence

Regulatory element        Consensus Sequence+   Sequence in PEM    Location*

SP1                       GGGCGG                GGGGGG             -727
                                                GGGCGG             -397
                                                GGGGGG             -94
                                                GGGCGGGCGGGCGGG    -54
SV40 enhancer element
  a                       ATGTGTGT              CTGTGGGT           -562
  b                       GCATGCAT              GCCTGCCT           +25
  c                       GTGGATAG              GTGGAGAG           -702
AP-1                      CTGACTGA              GTGACCAC           -739
                          G A                   CTGCTTCA           -418
                                                GTGCCTAG           -61
                                                CTGCCTGA           +27
AP-2                      CCCCAGGC              ACCCAGGC           -597
                          GG                    CACCGGGC           +77
OTl/CTF                   TTGGCrmWAGCCAA        TTGGCTTTCTCCAA     -618
Glucocorticoid regulatory element:
  Core sequence           TGTTCT                TGTTCT             +38
                                                TGTTCC             -321
  Consensus sequence      GGTAGANNNTGTTCT       GCCTGAATCTGTTCT    +29
                                                AGCTGGCTTTGTTCC    -330
CACCC factor              CACCC                 CACCC              +54
                                                CACCC              +84
Progesterone receptor     ATTCCTCTGT            ACTGGTCTCC         -802
  consensus sequence                            ACTCCTCCTT         -626
                                                ATTTCTCGGC         -432
Estrogen consensus
  sequence                GGTGANNMTGACC         GCTGCCGGTGACC      -746
RNA Polymerase III
  Box A                   RRYNNARYXGG           GACCTAGGTGG        -335
                                                AGTGGAGTGGG        -388
  Box B                   GWTCRAKHC             GTTCCAGAC          -260
Enhancer sequences:
  Interferon-β seq        GGAAATTC              GGAAATTTCTTCC      -642
  CfPT enhancer           GGAAAGTCCCGTT         GGAAAGTCCGGCT      -585
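
One way to locate candidate elements of the kind listed in Table 2 is to turn a consensus into a regular expression using the degenerate base codes from the footnotes. The sketch below is only an illustration: the test sequence is hypothetical rather than taken from Fig. 1, exact matching is used (whereas several PEM entries in the table differ from their consensus), and the code "X" is omitted because its definition is missing from the footnote.

```python
import re

# Degenerate base codes as defined in the footnotes to Tables 1 and 2.
# "X" is deliberately omitted: its definition is missing from the source.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "W": "[AT]", "N": "[ACGT]"}


def consensus_to_regex(consensus: str) -> str:
    """Translate a consensus such as 'RGGN' into a regex such as '[AG]GG[ACGT]'."""
    return "".join(IUPAC[base] for base in consensus.upper())


def find_sites(consensus: str, sequence: str) -> list:
    """Return 0-based start offsets of all matches, including overlapping ones."""
    # A lookahead lets overlapping matches be reported individually.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]


test_sequence = "AAGGGCGGGCGGTTTGTTCT"       # hypothetical, not from Fig. 1

print(find_sites("GGGCGG", test_sequence))   # SP1 consensus
print(find_sites("TGTTCT", test_sequence))   # glucocorticoid core sequence
```

Offsets returned here are 0-based positions within the test string; converting them to the signed numbering of Fig. 1 would require knowing where the transcriptional start site lies in the sequence being scanned.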
The sequence in Fig. 1 also includes two sites occurring in
the promoter region and in the first intron having 70 to
80% homology with the mammary consensus sequence (Rosen,
J.M. in "The Mammary Gland, Development, Regulation and
Function", Ed. Neville, M.C. and Daniel, C.W., Plenum Press,
301-322). These sites are set out in Table 3 below:

TABLE 3

Location        Sequence

                  ***      *  **
-289 to -274    AGGCTAAAACTAGAGC
                 *     ** **
+230 to +245    GTAAGAATTGGAGACA
Consensus       RGAAGRAAANTGGACA

Positions are numbered in accordance with Fig. 1.
* indicates a mismatch with the consensus sequence.
In the consensus sequence: R is A or G.
                           N is A, C, G or T.



Preferred fragments according to the present
invention include the transcriptional and translational start
signals, "TATAA" box and at least one of the regulatory
elements (transcription factor binding sites) set out in
Table 2 above. More preferably these fragments contain 2 or
more, for instance 3, 4 or 5 of the regulatory elements in
addition to the TATAA box or even all of the regulatory
elements set out in Table 2. Those fragments containing more
than one of the regulatory elements of Table 2 preferably
also preserve the relative spacings of those sites from one
another and from the TATAA box and transcriptional and
translational start signals.

Other preferred fragments of the invention contain at
least one of the regions homologous to the mammary consensus
sequences as set out in Table 3. Preferably these fragments
contain both of the regions having homology with the mammary
consensus sequences as set out in Table 3. Those fragments
containing both regions having homology with the mammary
consensus sequence preferably also preserve the relative
spacing of those regions as set out in Fig. 1, from one
another and from the TATAA box and transcriptional and
translational start signals.

Yet further preferred fragments according to the
invention comprise the TATAA box, the transcriptional and
translational start signals, at least one and preferably two
or more of the regulatory elements as set out in Table 2 and
at least one and preferably both of the regions having
homology with the mammary consensus sequence as set out in
Table 3. Yet more preferably these fragments also preserve
the relative spacing of the features from Tables 1, 2 and 3.
Particularly preferred fragments according to the invention
comprise the sequence upstream of the TATAA box as set out
in Fig. 1 together with, and downstream thereof,
transcriptional and translational start signals and a
polypeptide coding sequence in correct reading frame
register with the promoter sequences and the TATAA box,
transcriptional and translational start signals. The coding
sequence may encode a part or parts of the polypeptide
encoded by the mucin gene, for instance a part or parts
thereof other than the tandem repeat sequence, or
polypeptides unrelated to that encoded by the mucin gene.

Other particularly preferred fragments according to
the present invention comprise promoter sequences, a TATAA
box, transcriptional and translational start signals and,
downstream thereof and in correct reading frame register
therewith, a coding sequence corresponding to a portion of
the mucin gene, for instance corresponding to the first exon
(corresponding to bases 1 to 130 of Fig. 1) or a part
thereof and/or the second exon (corresponding to bases 633
onwards in Fig. 1) or a part thereof, for instance a part
thereof other than the tandem repeat sequence as set out in
WO-A-88/05054.

In an especially preferred aspect the fragments
contain (i) the first 26 bases (bases 1 to 26 of Fig. 1) or
(ii) the whole of the first exon (bases 1 to 130 of Fig. 1)
and/or (iii) the splicing/ligating sites for the first intron
set out in Table 1 and a non-coding sequence between these
sites. The non-coding sequence may be the same as or
different to the sequence of the first intron as shown in
Fig. 1. Preferably it is the same.

Other preferred fragments of the invention comprise
at least a portion of the first intron (bases 131 to 632 of
Fig. 1). Further preferred fragments of the invention
comprise at least a portion of the 5'-flanking sequence
upstream of base -423 of Fig. 1.

Other preferred fragments of the invention comprise a
portion of the sequence of Fig. 1 corresponding to a portion
of the sequence of Fig. 3.

Further preferred fragments of the invention comprise 
a combination of any two or more of the foregoing preferred 
features. 

Fragments according to the present invention
containing functional coding sequences for at least a part of
the first or second exons set out in Fig. 1 are useful in the
production of polypeptides corresponding to a part or all of
the mucin gene product. Such polypeptides are, in turn,
useful as immunogenic agents for instance in active
immunisation against Human Polymorphic Epithelial Mucin
(HPEM) for the prophylactic or therapeutic treatment of
cancers or raising antibodies for use in passive immunisation
and diagnosis of cancers. For use in such methods the
fragment, which codes for a polypeptide chain substantially
identical to a portion of the mucin core protein, may be
extended at either or both the 5' and 3' ends with further
coding or non-coding nucleic acid sequences including
regulatory and promoter sequences, marker sequences, and
splicing or ligating sites. Coding sequences may code for
other portions of the mucin core protein chain (for instance,
other than the tandem repeat) or for other polypeptide
chains. The fragment according to the invention, together
with any necessary or desirable flanking sequences, is
inserted, in an appropriate open reading frame register, into
a suitable vector such as a plasmid, or cosmid or a viral
genome (for instance vaccinia virus genome) and is then
expressed as a polypeptide product by conventional
techniques. In one aspect the polypeptide product may be
produced by culturing appropriate cells transformed with a
vector, harvested and used as an immunogen to induce active
immunity against the mucin core protein [Tartaglia et al.,
Tibtech, 6, 43 (1988)].

Fragments according to the present invention
incorporating regulatory elements of Table 2 and/or mammary
consensus sequences of Table 3 may be used in securing
tissue-specific expression of functional coding sequences in
appropriate reading frame register downstream of the
regulatory elements and/or associated with the mammary
consensus sequences. Such fragments may therefore be used
to express parts or the whole of the mucin gene or any other
coding sequence in cells of epithelial origin. Applications
of this are in therapy and immunisation where such fragments
and associated coding sequences are administered to patients
such that the coding sequence will be expressed in
epithelial tissues leading to a therapeutic effect or an
immune reaction by the patient against the polypeptide.

The fragments may be presented as inserts in a vector
such as viral genomic nucleic acid and introduced into the
patients by inoculation of the vector for instance as a
modified virus. The vector then directs expression of the
polypeptide in vivo and this in turn serves as a therapeutic
agent or as an immunogen to induce active immunity against
the polypeptide. This strategy may be adopted, for
instance, to secure expression of polypeptides encoded by
the HPEM gene for treatment of
adenocarcinomas such as breast cancer or to secure tissue
specific expression of other peptides under control of the
regulatory sequences of Table 1, for instance by
administration of a modified vaccinia virus containing the
fragment and coding sequences in its genomic DNA. RNA
fragments of the invention may similarly be used by
administration via a retroviral vector. Selection of tissue
specific virus vectors to carry the fragments of the
invention and coding sequences will further restrict
expression of the polypeptide to desired target tissues.

Fragments of the invention may also be used to
control expression of oncogenic proteins in experimental
transgenic animals. Thus, for instance, a transgenic mouse
having an oncogene such as ras, erbB-2 or int 2 expressed
under control of the present tissue specific fragments may
develop breast tumours and be useful in testing diagnostic
agents such as tumour localisation and imaging agents and in
testing therapeutic agents such as immunotoxins.

Nucleic acid fragments according to the invention are also
useful as hybridisation probes for detecting the presence of
DNA or RNA of corresponding sequence in a sample. For use
as probes fragments are preferably labelled with a
detectable label such as a radionuclide, enzyme label,
fluorescent label or other conventional directly or
indirectly detectable labels. For some applications, the
probes may be bound to a solid support. Labelling of the
probes may be achieved by conventional methods such as set
out in Matthews et al., Anal. Biochem.,
169: 1-25 (1988).

In further aspects, the present invention provides
cloning vectors and expression vectors containing fragments
according to the present invention. The vectors may be, for
instance, plasmids, cosmids or viral genomic DNA. The
present invention further provides host cells containing
such cloning and expression vectors, for instance epithelial
cells transformed with functional expression vectors
containing expressible fragments according to the invention.

The invention further provides nucleic acid fragments
which encode polypeptides as defined below. Such fragments
may be fragments as hereinbefore defined. However, in view
of the redundancy of the genetic code, nucleic acid
sequences which differ slightly or substantially from the
sequence of Fig. 2 may nevertheless encode the same
polypeptide.
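
The redundancy of the genetic code referred to above can be demonstrated with a small translation routine based on the standard genetic code. Both DNA sequences below are hypothetical and are not taken from Fig. 2; they are chosen only to show that different codons can specify the same residues.

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a coding-strand DNA sequence, frame 1, one-letter codes.

    Trailing bases that do not fill a complete codon are ignored.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


seq_a = "ATGACACCGGGC"   # hypothetical coding sequence
seq_b = "ATGACTCCAGGA"   # differs from seq_a at three third-codon positions

print(translate(seq_a), translate(seq_b))
```

The two sequences differ at 3 of 12 bases yet give the same four-residue peptide, which is why a fragment need not match the Fig. 2 sequence to encode the same polypeptide.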

The nucleic acid fragments of the invention may be
produced de novo by conventional nucleic acid synthesis
techniques or obtained from human epithelial cells by
conventional methods (Huynh et al., "DNA Cloning: A
Practical Approach", Glover, D.M. (Ed.), IRL, Oxford, Vol. 1,
pp. 49-78 (1985)).

The invention therefore also provides probes, vectors and
transformed cells comprising nucleic acid fragments as
hereinbefore defined for use in methods of treatment of the
human or animal body by surgery or therapy and in diagnostic
methods practiced on the human or animal body and for use in
the preparation of medicaments for use in such methods. The
invention also provides methods for treatment of the human
or animal body by surgery or therapy and diagnostic methods
practiced in vivo as well as ex vivo and in vitro which
comprise administering such fragments, probes, vectors or
transformed cells in effective non-toxic amount to a human
or other mammal in need thereof.

Processes for producing fragments according to the
invention and probes, vectors and transformed cells
containing them and processes for expressing polypeptides
encoded by, or under the regulatory control of, fragments of
the invention also form aspects of the invention.

The invention further provides a polypeptide comprising a
sequence of at least 5 amino acid residues encoded by the
coding portion of the DNA sequence as indicated in Fig. 2.
Polypeptides according to the invention preferably have a
sequence of at least 10 residues, for instance at least 15,
more preferably 20 or more residues and most preferably all
the residues shown in Fig. 2.

The polypeptide may additionally comprise N-terminal
and/or C-terminal sequences not encoded by the DNA sequence
indicated by Fig. 2.

Polypeptides of the invention containing more than 5
amino acid residues encoded by the DNA sequence in Fig. 2
may include minor variations by way of substitution,
deletion or insertion of individual amino acid residues.
Preferably such polypeptides differ at not more than 20%,
preferably not more than 10% and most preferably not more
than 5% of residues in a contiguous portion corresponding to
a portion of the sequence in Fig. 2.

The invention further provides polypeptides as
defined above modified by addition of a linkage sugar such
as N-acetyl galactosamine on serine and/or threonine
residues and polypeptides modified by addition of
oligosaccharide moieties to N-acetyl galactosamine or via
other linkage sugars. Optionally modified polypeptides
linked to carrier proteins such as keyhole limpet
haemocyanin, albumen or thyroglobulin are also within the
invention.

Polypeptides according to the invention may be
produced de novo by synthetic methods or by expression of
the appropriate DNA fragments described above by recombinant
DNA techniques and expressed without glycosylation in human
or non-human cells. Alternatively they may be obtained by
deglycosylating native human mucin glycoprotein (which
itself may be produced by isolation from samples of human
tissue or body fluids or by expression and full processing
in a human cell line) [Burchell et al., Cancer Research, 47:
5467-5482 (1987); Gendler et al., P.N.A.S., 84: 6060-6064
(1987)], and digesting the core protein. The polypeptides
of the invention are useful in active immunisation of
humans, for raising antibodies in animals for use in passive
immunisation, diagnostic tests, tumour localisation and,
when used in conjunction with a cytotoxic agent, for tumour
therapy.

The invention further provides antibodies against
any of the polypeptides described above.

As used hereafter the term "antibody" is intended to
include polyclonal and monoclonal antibodies and fragments
of antibodies bearing antigen binding sites such as the
F(ab')2 fragments as well as such antibodies or fragments
thereof which have been modified chemically or genetically
in order to vary the amino acid residue sequence of one or
more polypeptide chains, to change the species specific
and/or isotype specific regions and/or to combine
polypeptide chains from different sources. Especially in
therapeutic applications it may be appropriate to modify the
antibody by coupling the Fab, or complementarity-determining
region thereof, to the Fc, or whole framework,
region of antibodies derived from the species to be treated
(e.g. such that the Fab region of mouse monoclonal
antibodies may be administered with a human Fc region to
reduce immune response by a human patient) or in order to
vary the isotype of the antibody (see EP-A-0 239 400). Such
antibodies may be obtained by conventional methods
[Williams, Tibtech, 6: 36 (1988)] and are useful in
diagnostic and therapeutic applications, such as passive
immunisation.

The term "antibodies" used herein is further intended
to encompass antibody molecules or fragments thereof as
defined above produced by recombinant DNA techniques as well
as so-called "single domain antibodies" or "dAbs" such as
are described by Ward, E.S. et al., Nature, 341: 544-546
(1989) which are produced in recombinant microorganisms,
such as Escherichia coli, harboring expressible DNA
sequences derived from the DNA encoding the variable domain
of an immunoglobulin heavy chain by random mutation
introduced, for instance, during polymerase chain reaction
amplification of [...]
produced by screening a library of such randomly mutated DNA
sequences and selecting those which enable expression of
polypeptides capable of specifically binding the
polypeptides of the invention or HPEM core protein.

Antibodies according to the present invention react
with HPEM core protein, especially as expressed by colon,
lung, ovary and particularly breast carcinomas, but have
reduced or no reaction with corresponding fully processed
HPEM. In a particular aspect the antibodies react with HPEM
core protein but not with fully processed HPEM glycoprotein
as produced by the normal lactating human mammary gland.

Antibodies according to the present invention
preferably have no significant reaction with the mucin
glycoproteins produced by pregnant or lactating mammary
epithelial tissues but react with the mucin proteins
expressed by mammary epithelial adenocarcinoma cells. These
antibodies show a much reduced reaction with benign breast
tumours and are therefore useful in diagnosis and
localisation of breast cancer as well as in therapeutic
methods.
Further uses of the antibodies include diagnostic
tests or assays for detecting and/or assessing the severity
of breast, colon, ovary and lung cancers.

The antibodies may be used for other purposes
including screening cell cultures for the polypeptide
expression product of the human mammary epithelial
gene, or fragments thereof, particularly the nascent
expression product. In this case the antibodies may
conveniently be polyclonal or monoclonal antibodies.

The invention further provides antibodies linked to
therapeutically or diagnostically effective ligands. For
therapeutic use of the antibodies the ligands are lethal
agents to be delivered to cancerous breast or other tissue
in order to incapacitate or kill transformed cells. Lethal
agents include toxins, radioisotopes and "direct killing
agents" such as components of complement as well as
cytotoxic or other drugs.

For diagnostic applications the antibodies may be
linked to ligands such as solid supports and detectable
labels such as enzyme labels, chromophores, fluorophores and
radioisotopes and other directly or indirectly detectable
labels. Preferably monoclonal antibodies are used in
diagnosis.

Antibodies according to the present invention may be
produced by inoculation of suitable animals with a
polypeptide as hereinbefore described. Monoclonal
antibodies are produced by known methods, for instance by
the method of Kohler & Milstein [Nature, 256: 495-497
(1975)] by immortalising spleen cells from an animal
inoculated with the mucin core protein or a fragment
thereof, usually by fusion with an immortal cell line
(preferably a myeloma cell line), of the same or a different
species as the inoculated animal, followed by the
appropriate cloning and screening steps.

Antibody-producing cells obtained from animals
inoculated with polypeptides of the invention and
immortalised such cells form further aspects of the
invention.

The invention further provides polypeptides,
antibodies and antibody producing cells, such as hybridomas,
as hereinbefore defined for use in methods of surgery,
therapy or diagnosis practiced on the human or animal body
or for use in the production of medicaments for use in such
methods. The invention also provides a method of treatment
or diagnosis which comprises administering an effective
non-toxic amount of a polypeptide or antibody as
hereinbefore described to a human or animal in need thereof.

Processes for producing polypeptides according to the
invention whether by expression of nucleic acid fragments of
the invention or otherwise, and for producing antibodies or
fragments thereof and for producing antibody-producing cells
such as immortalised cells, form further aspects of the
invention.

The invention further provides a diagnostic test or
assay method comprising contacting a sample suspected to
contain abnormal human mucin glycoproteins with an antibody
as defined above. Such methods include tumour localisation
involving administration to the patient of the antibody
bearing a detectable label or administration of an antibody
and, separately, simultaneously or sequentially in either
order, administering a labelling entity capable of
selectively binding the antibody or fragment thereof.

Diagnostic test kits are provided for use in diagnostic
tests or assays and comprise antibody and, optionally,
suitable labels and other reagents and, especially for use
in competitive assays, standard sera.

The invention will now be illustrated with reference
to the figures of the accompanying drawings in which:

Fig. 1 shows the deoxynucleotide base sequence of the 1763
bases upstream of and including the first SmaI restriction
site in the tandem repeat sequence of WO-A-88/05054 using
the conventional symbols A, C, G and T for the bases of the
non-template strand. The base sequence is arranged in
blocks of ten. Untranscribed sequence is in lower case,
transcribed sequence is in upper case. The SP1 regulatory
elements (Table 2), TATAA box, transcriptional and
translational start sites (Table 1) are underlined.
Fig. 2 shows the sequence of the non-template strand
commencing from the transcriptional start site (residue 1
in Fig. 1) and excluding the sequence of the first intron
(bases 131 to 632 of the sequence in Fig. 1). Fig. 2 also
shows the predicted sequence of the polypeptide using the
conventional 1 letter symbols for the amino acid residues.
Amino acid residues are numbered down the left-hand side and
nucleotide bases down the right-hand side. The signal
sequence is underlined. The sequences end at the first SmaI
site in the tandem repeat.
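
The relationship between the Fig. 1 numbering and the intron-free sequence of Fig. 2 can be sketched as a simple splice using the 1-based, inclusive intron coordinates of Table 1 (bases 131 to 632). The sequences used below are placeholders, and the assumption that the transcribed region of interest runs from base 1 to base 960 (the end of the first SmaI site) is an inference from Table 1, not a statement from the text.

```python
def remove_intron(seq: str, intron_start: int, intron_end: int) -> str:
    """Remove bases intron_start..intron_end (1-based, inclusive) from seq."""
    if not 1 <= intron_start <= intron_end <= len(seq):
        raise ValueError("intron coordinates fall outside the sequence")
    return seq[:intron_start - 1] + seq[intron_end:]


# Toy example: exon (bases 1-5), intron (bases 6-13, GT...AG), exon (14-17).
toy = "AAAAA" + "GTCCCCAG" + "TTTT"
print(remove_intron(toy, 6, 13))

# Placeholder of 960 bases standing in for bases 1 to 960 of Fig. 1
# (assumed extent; the actual sequence is not reproduced here).
placeholder = "N" * 960
spliced = remove_intron(placeholder, 131, 632)
print(len(spliced))  # length remaining once the 502-base intron is removed
```

On these assumptions the first exon contributes 130 bases and the second exon the remainder, so a base numbered n >= 633 in Fig. 1 would appear at position n - 502 in the spliced sequence.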

Fig. 3 shows the deoxynucleotide base sequence of the 1575
bases upstream of and including the first SmaI restriction
site in the tandem repeat sequence of WO-A-88/05054 using
the conventional symbols A, C, G and T for the bases of the
non-template strand. The base sequence is arranged in
blocks of ten in non-coding regions. The exon sequences are
shown in blocks of three and translated codons are
underlined. The start positions of exons 1 and 2, intron 1
and the signal sequence for exon splicing are numbered and
labelled. Other features mentioned in Tables 1 and 2 are
boxed. The sequence finishes with the first SmaI site of
the tandem repeat sequence.

The present invention does not extend to fragments,
polypeptides and antibodies or related materials such as
vectors and cells, which are specifically disclosed in
WO-A-88/05054 or WO-A-90/05142, nor to the cDNA fragment whose
sequence is indicated in Abe, M. et al., in Biochemical and
Biophysical Research Communications, 165(2): 644-649 (1989).



The invention will now be illustrated by the
following Examples:

EXAMPLE 1 



In an attempt to obtain clones with 5' unique
sequences, two λgt10 libraries were screened with a probe
for the tandem repeat. All the clones obtained lacked any
non-repetitive sequence at the 5' terminus. Thus, a
different strategy was adopted. To obtain 5' sequence we
synthesized the cDNA corresponding to the 5' end of breast
cancer cell line transcript using anchored-polymerase chain
reaction (A-PCR). The A-PCR procedure [Loh, E.Y. et al.,
Science, 243: 217-220 (1989)] was used to synthesize cDNA
corresponding to the 5' end of the transcript. For the 5'
end clones total RNA (5 µg) prepared by the guanidinium
isothiocyanate method [Chirgwin, J.M. et al., Biochem., 18:
5294-5299 (1979)] was used for first strand synthesis using
a breast cancer cell line (BT20) transcript with AMV-reverse
transcriptase (Life Sciences) in a 40 µl reaction mixture
[Okayama, H. and Berg, P., Mol. Cell. Biol., 2: 161-170
(1982)] containing 1 µg of an oligonucleotide primer made to
the tandem repeat (5'CCAAGCTTGGAGCCCGGGGCCGGCCTGGTGTCCGG3').
The total RNA was subjected to reverse transcription, and
the products were precipitated with spermine. A poly(dG)
tail was introduced with terminal deoxy-transferase (500
U/ml, Pharmacia). Amplification was performed with Thermus
aquaticus polymerase (Perkin Elmer Cetus) in 100 µl of the
standard buffer supplied. The primers included the tandem
repeat primer and, for the poly(dG) end, a mixture of the AN
polyC primer (5'GCATGCGCGCGGCCGCGGAGGCCCCCCCCCCCCCC3') and
the AN primer (5'GCATGCGCGGGGCGGGGGAGGGC3') at a ratio of
1:9. Following an initial denaturation at 94°C for 5 min,
the reaction was annealed at 55°C for 2 min, extended at
72°C for 2.5 min and denatured at 94°C for 1.5 min.

Amplification was performed for 30 cycles, and the product
was precipitated with ethanol. The DNA was sequentially cut
with HindIII and SacII, separated on an agarose gel and
the band of approximately 550 bp was purified onto DEAE
membrane (Schleicher and Schuell), ligated into pBS-SK+ and
transformed into bacteria XL-1 (Stratagene). This plasmid
will be referred to as pBS-5'Paf. All restriction enzymes
used were obtained from New England Biolabs Inc.;
oligonucleotide primers and probes were synthesized on an
Applied Biosystems 380B DNA synthesizer.
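
As a quick consistency check on the cloning step, the following Python sketch searches the tandem repeat primer and the AN polyC primer, as printed above, for the recognition sites of the enzymes named in this Example. The recognition sequences are the standard ones for HindIII, SmaI and SacII and are not taken from the text; the function and variable names are illustrative only.

```python
# Standard recognition sequences (all three are palindromic, so searching
# one strand is sufficient). These values are general knowledge, not
# something stated in the Example itself.
SITES = {
    "HindIII": "AAGCTT",
    "SmaI": "CCCGGG",
    "SacII": "CCGCGG",
}

# Primer sequences copied as printed in Example 1 (5' to 3').
PRIMERS = {
    "tandem_repeat": "CCAAGCTTGGAGCCCGGGGCCGGCCTGGTGTCCGG",
    "AN_polyC": "GCATGCGCGCGGCCGCGGAGGCCCCCCCCCCCCCC",
}


def find_sites(seq: str) -> dict:
    """Map enzyme name to the 0-based start of its first site in seq."""
    hits = {}
    for enzyme, site in SITES.items():
        pos = seq.find(site)
        if pos != -1:
            hits[enzyme] = pos
    return hits


for name, seq in PRIMERS.items():
    print(name, find_sites(seq))
```

Under these assumptions the tandem repeat primer carries a HindIII site and a SmaI site, and the AN polyC primer carries a SacII site, which fits the sequential HindIII and SacII digestion described above.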

Four colonies were selected for sequencing, and the
sequences agreed with each other and with sequence obtained
from genomic clones of the region. A leader sequence of 72
bp preceded the first ATG which was in-frame with the
reading frame of the tandem repeat as previously determined
(Fig. 1), and the sequence preceding the first ATG, CGAGCATGA,
agrees with the Kozak consensus sequence (Kozak, M., Nucl.
Acids Res., 12: 857-872 (1984)).

The primer extension technique was used to map
precisely the position of the cap site. A 21 bp
oligonucleotide primer (5'AGAGTGGGTGCCGGGTGTGAT3')
corresponding to nucleotides 73 to 93 ending at the A of ATG
(Fig. 1) was end-labelled with [γ-32P]ATP (> 5000 Ci/mmol,
Amersham International plc) using T4 polynucleotide kinase
(Pharmacia) and precipitated three times with equal volumes
of 4 M ammonium acetate to remove free [γ-32P]ATP from the
kinased oligonucleotide. Labelled primer (1 x 10^ dpm at 1
x 10^7 dpm/pmole) was annealed to 40 µg of total BT20 RNA in
120 mM sodium chloride at 95°C for 5 min, held at 65°C for 1
h and cooled to room temperature. The annealed primer was
extended using 18 units of reverse transcriptase in 50 mM
Tris pH 8.3 at 45°C, 6 mM magnesium acetate, 10 mM
dithiothreitol, 1.8 mM dNTPs in a total volume of 50 µl at
45°C for 1 h. The reaction was stopped by the addition of 50
mM EDTA and the RNA digested by treatment with RNase A at
400 µg/ml for 15 min at 37°C. The samples were then
phenol:chloroform extracted prior to ethanol precipitation
and electrophoresed on a standard 6% sequencing gel yielding
two bands which mapped to two C's, 72 and 71 bases upstream
of the ATG. The sequencing ladder was single-stranded
control DNA (M13mp18) from the Sequenase kit (US Biochemical
Corp.).
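
The relationship between the extension primer and the strand it anneals to can be illustrated with a short Python sketch. It takes the primer exactly as printed above, confirms that its length matches the stated span (nucleotides 73 to 93) and derives the reverse complement, that is, the stretch of the opposite strand to which such a primer would pair. The names are illustrative, and the printed primer may itself carry transcription errors, so the output should be read as a sketch rather than as a verified part of Fig. 1.

```python
# Watson-Crick complement table for DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


PRIMER = "AGAGTGGGTGCCGGGTGTGAT"  # as printed in the text, 5' to 3'
FIRST_BASE, LAST_BASE = 73, 93    # span stated in the text (inclusive)

span_length = LAST_BASE - FIRST_BASE + 1
target = reverse_complement(PRIMER)

print("primer length:", len(PRIMER), "stated span:", span_length)
print("strand paired by the primer (5' to 3'):", target)
```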

The most prominent product was 72 bp, equal to the
number of base pairs from the 5' end of the oligonucleotide
primer to the 5' end of the PCR-derived clone, thus
confirming that the cDNA represents the entire length of its
corresponding cellular mRNA 5' to the tandem repeat. The
presence of a second band may be due to interference with
reverse transcriptase by methylation of the C at base 71,
since it forms a CpG dinucleotide. Under identical
conditions, no primer extension product was seen using RNA
from Daudi cells which do not express the PEM mucin.

Cloning



A plasmid library, grown in DH1 cells (RecA-), was
used instead of a lambda library, because of the possibility
of recombination occurring when lambda is grown in RecA+
cells. This recombination might have been expected, since
a part of the tandem repeat sequence (GGTGGGGG) is closely
related to the chi sequence (GCTGGTGG) of lambda phage which
has been implicated as a hotspot for RecA-mediated
recombination in E. coli.
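
How closely the two octamers quoted above resemble each other can be shown with a position-by-position comparison. The Python sketch below uses only the two sequences given in the text; the helper name is illustrative.

```python
def mismatches(a: str, b: str) -> list:
    """Return the 0-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


REPEAT_MOTIF = "GGTGGGGG"  # part of the tandem repeat, as quoted in the text
CHI = "GCTGGTGG"           # chi sequence, as quoted in the text

diff = mismatches(REPEAT_MOTIF, CHI)
identity = 1 - len(diff) / len(CHI)

print("mismatch positions:", diff)   # [1, 5]
print("fraction identical:", identity)  # 0.75
```

The two sequences agree at six of eight positions, which is the sense in which the repeat motif is "closely related" to chi.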
Nucleotide sequence of cDNA clones

Fig. 1 shows the DNA sequence from the 5'
A-PCR-derived clone, including the consensus sequence of the
tandem repeat. Sequences were determined in both
directions. The region of conserved tandem repeats was not
sequenced in full, although a cDNA tandem repeat clone
obtained previously had been circularised, sonicated and
about 40 clones sequenced [Gendler et al., J. Biol. Chem.,
263: 12820-12823 (1988)].
Predicted amino acid sequence and composition of the PEM
core protein

The core protein amino acid composition is dominated
by the amino acid composition of the tandem repeat. Serine,
threonine, proline, alanine and glycine account for about
60% of the amino acids.

The deduced sequence of the PEM core protein consists
of distinct regions including (1) the N-terminal region
containing a hydrophobic signal sequence and degenerate
tandem repeats and (2) the tandem repeat region itself.
At the N-terminus a putative signal peptide of 13 amino
acids follows the first 7 amino acids. However, the actual
site of cleavage has not been determined, since attempts to
obtain N-terminal sequence of the core protein were hindered
by a blocked amino terminus. Following the signal sequence
and preceding the first SmaI site (which is used to define
the beginning of the tandem repeat region) are 107 amino
acids. Greater than 50% of these amino acids comprise
degenerate tandem repeats. Since the number of tandem
repeats per molecule is large (greater than 21 for the
smallest allele we have observed), this domain forms the
major part of the core protein, and results in a highly
repetitive structure which is extremely immunogenic
[Gendler, S. et al., loc. cit.]. The sequence of the 20
amino acid tandem repeat unit corresponds to what might be
expected for a protein which is extensively O-glycosylated.
Five serines and threonines, four of which are in doublets,
are found in the repeat and these potential glycosylation
sites are separated by regions rich in prolines (see Fig.
2).
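
The serine/threonine count described above can be reproduced with a few lines of Python. The repeat sequence is not given in this section; the sketch assumes the 20 amino acid repeat unit as published for the human polymorphic epithelial mucin, PDTRPAPGSTAPPAHGVTSA, written in an arbitrarily chosen frame. That sequence, and all names in the code, are assumptions made for illustration and should be checked against Fig. 2.

```python
from collections import Counter

# ASSUMPTION: published 20-residue repeat unit, frame chosen arbitrarily.
# It is not quoted in this section of the text.
REPEAT = "PDTRPAPGSTAPPAHGVTSA"


def ser_thr_positions(seq: str) -> list:
    """0-based positions of serine (S) and threonine (T) residues."""
    return [i for i, aa in enumerate(seq) if aa in "ST"]


def in_doublets(seq: str) -> list:
    """Ser/Thr positions that sit directly next to another Ser/Thr."""
    pos = set(ser_thr_positions(seq))
    return [i for i in sorted(pos) if i - 1 in pos or i + 1 in pos]


counts = Counter(REPEAT)
# Fraction of the repeat made up of Ser, Thr, Pro, Ala and Gly.
small_fraction = sum(counts[aa] for aa in "STPAG") / len(REPEAT)

print("Ser/Thr positions:", ser_thr_positions(REPEAT))
print("in doublets:", in_doublets(REPEAT))
print("prolines:", counts["P"], "S+T+P+A+G fraction:", small_fraction)
```

With the assumed sequence the sketch finds five serines and threonines, four of them in two doublets, in agreement with the description above. The S+T+P+A+G fraction it reports refers to the repeat unit alone, not to the whole core protein.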
CLAIMS



1. A nucleic acid fragment comprising a portion of at
least 17 contiguous nucleotide bases which portion has a
sequence the same as, or homologous to a portion of
corresponding length of the sequence of the coding strand as
set out in Fig. 1 or the same as, or homologous to a portion
of corresponding length of the sequence complementary to the
sequence of the coding strand set out in Fig. 1.
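
The "same as" limb of claim 1 can be pictured as a sliding-window comparison. The Python sketch below tests whether any run of 17 contiguous bases in a candidate fragment occurs exactly in a reference strand or in its complementary strand. It is an illustration only: it does not attempt the "homologous" limb, which this section does not define, and the reference used here is a made-up placeholder, not the Fig. 1 sequence. All names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
MIN_LENGTH = 17  # minimum portion length recited in claim 1


def reverse_complement(seq: str) -> str:
    """Complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def shares_portion(fragment: str, reference: str,
                   min_length: int = MIN_LENGTH) -> bool:
    """True if some window of min_length bases of fragment is identical to
    part of reference or of its complementary strand (exact match only)."""
    strands = (reference, reverse_complement(reference))
    for start in range(len(fragment) - min_length + 1):
        window = fragment[start:start + min_length]
        if any(window in strand for strand in strands):
            return True
    return False


# PLACEHOLDER reference for demonstration; this is NOT the Fig. 1 sequence.
REFERENCE = "ACGTTGCAAGCTTGGAGCCCGGGTACCGATCGATTAGC"

print(shares_portion("TTTT" + REFERENCE[5:25] + "AAAA", REFERENCE))
print(shares_portion(REFERENCE[:16], REFERENCE))
```

A fragment shorter than 17 bases never qualifies under this sketch, because no window of the required length exists.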

2. A fragment according to claim 1 comprising any one or
more of the following:

(a) a signal sequence
TTCCTGCTGCTGCTCCTCACAGTGCTTACAGXTTGTT
wherein X is an optionally present intron

(b) a mammary consensus sequence AGGCTAAAACTAGACC

(c) a mammary consensus sequence GTAAGAATTGCAGACA

(d) a homologue of a sequence (a), (b) or (c) and

(e) a sequence complementary to a sequence (a), (b), (c)
or (d).
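
The optional element X in sequence (a) can be modelled as a variable-length insert between two fixed flanks. The following Python sketch, with hypothetical names, searches a sequence for the signal sequence of (a) and reports whatever lies at position X, returning an empty string when the intron is absent. The test sequences in the usage lines are invented for the demonstration.

```python
import re

# Fixed flanks on either side of X, copied from sequence (a).
UPSTREAM = "TTCCTGCTGCTGCTCCTCACAGTGCTTACAG"
DOWNSTREAM = "TTGTT"

# X is modelled as the shortest run of bases (possibly empty) between flanks.
PATTERN = re.compile(f"{UPSTREAM}([ACGT]*?){DOWNSTREAM}")


def find_signal_sequence(seq: str):
    """Return the insert found at X ('' if absent), or None if no match."""
    match = PATTERN.search(seq)
    return match.group(1) if match else None


# Invented examples: spliced form, form with a toy insert, and no match.
print(repr(find_signal_sequence("GG" + UPSTREAM + DOWNSTREAM + "AA")))
print(repr(find_signal_sequence(UPSTREAM + "GTGAGC" + DOWNSTREAM)))
print(repr(find_signal_sequence("ACGTACGT")))
```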



3. A hybridisation probe comprising a fragment according
to claim 1 or claim 2 bearing a detectable label or linked to
a solid support.

4. A cloning or expression vector comprising a fragment
according to claim 1 or claim 2.

5. A transformed cell comprising a cloning or expression
vector according to claim 4.

6. A polypeptide comprising a sequence of at least 5
contiguous amino acid residues encoded by the coding portion
of the DNA sequence as indicated in Fig. 2.

7. An antibody against a polypeptide according to claim
6.

8. An antibody according to claim 7 bearing a detectable
label or linked to a solid support.

9. An antibody-producing cell capable of secreting an
antibody according to claim 7.

10. A diagnostic kit comprising a fragment according to
claim 1 or claim 2 or a probe according to claim 3 or a
polypeptide according to claim 6 or an antibody according to
claim 7 or claim 8.

11. A fragment according to claim 1 or claim 2 or a probe
according to claim 3 or a vector according to claim 4 or a
cell according to claim 5 or claim 9 or a polypeptide
according to claim 6 or an antibody according to claim 7 or
claim 8 for use in a method of treatment or diagnosis
practised on the human or animal body.

12. Use of a fragment according to claim 1 or claim 2 or
a probe according to claim 3 or a vector according to claim 4
or a cell according to claim 5 or claim 9 or a polypeptide
according to claim 6 or an antibody according to claim 7 or
claim 8 in the preparation of a medicament for use in a
method of treatment or diagnosis practised on the human or
animal body.

13. A method of treatment or diagnosis comprising
administering to a cancer patient in need thereof or
suspected to have a cancer an effective non-toxic amount of a
fragment according to claim 1 or claim 2 or a probe according
to claim 3 or a vector according to claim 4 or a cell
according to claim 5 or claim 9 or a polypeptide according to
claim 6 or an antibody according to claim 7 or claim 8.

14. A method of diagnosis comprising contacting a sample
from a patient with a fragment according to claim 1 or claim
2 or a probe according to claim 3 or a vector according to
claim 4 or a cell according to claim 5 or claim 9 or a
polypeptide according to claim 6 or an antibody according to
claim 7 or claim 8.